ABSTRACT


The growth of the internet and digital data has led to forgery, modification,
and sharing of digital data without regard to property rights. Audio watermarking
is one solution for protecting audio content against copyright infringement.
This paper proposes an audio watermarking method that is robust against attacks
and offers high capacity. First, synchronization bits are added to the host
audio. The host audio is then decomposed by the Lifting Wavelet Transform (LWT),
and one subband of the LWT output is transformed by the discrete cosine
transform (DCT). Next, the matrix of DCT coefficients is passed to the singular
value decomposition (SVD), which yields the U, S, and V matrices. The watermark
is embedded in the S matrix. Before embedding, the watermark image is compressed
by compressive sampling. The results show that the proposed watermarking system
is highly robust against LPF, resampling, and linear speed change attacks, as
shown by a BER of zero.


This is an open access article under the CC BY-SA license. 


Corresponding Author: 


Ledya Novamizanti, 

School of Electrical Engineering, 

Telkom University, 

Telekomunikasi St., Terusan Buah Batu, Bandung 40257, Indonesia. 
Email: ledyaldn@telkomuniversity.ac.id 


1. INTRODUCTION 

Information and communication technologies are growing rapidly, as indicated by the large volume of
data traffic on the internet. The internet is easy to access and has become a daily need. This ease of
use makes the internet a source for accessing any kind of information and digital content, so the spread
of music, songs, and other audio files cannot be controlled. This deserves special attention from
the parties concerned because it can lead to losses through illegal distribution, copyright infringement,
and illegitimate sharing. It has therefore become necessary to protect the intellectual property of
digital content. Watermarking is one solution to this problem. A watermark is a signal or digital pattern
that is embedded into digital images, video, or audio [1]. The result of watermarking is not always as
perfect as expected, generally because noise can interfere with or change the audio file. This study
proposes a watermarking scheme for audio.

There are several studies related to audio watermarking using one transformation method.
A robust watermark embedded in an audio signal with the LWT method can resist attacks such as
low-pass filtering, Gaussian noise, resampling, salt and pepper noise, and compression. This method can
secure the copyright and ensure the integrity of the audio signal [2]. Dhar and Simamora [3] combined
the LWT with SVD and the Fast Walsh Hadamard Transform; their analysis showed that the method provides
high resilience against attacks including resampling, noise, MP3 compression, and cropping. However,
the resulting audio quality is not optimal. Other related research concerns the DCT method; the method
in [4] has fairly high resilience against MP3 compression attacks. Another related study is a lifting
wavelet method that lacks self-adaptability; without using the SNR, this scheme has not yet achieved
resistance to low-pass filtering, additive noise, and noise removal [5]. From experimental results,
the authors of [6] showed that watermark embedding does not noticeably degrade the audio signal, but
the extracted image still needs improvement. The LWT method in [7] has high resistance to attacks, but
the resulting SNR only approaches the desired value without reaching it. Many research results have
shown that SVD is effective in digital watermarking, including image watermarking [8, 9] and audio
watermarking [10-13].

Ozer et al. [10] proposed an audio watermarking scheme using the short-time Fourier transform (STFT)
based on SVD. With this method, it is difficult to detect watermarks that have been attacked by simple
noise. Bhat et al. [11] introduced an audio watermarking method based on DWT and quantization index
modulation (QIM). The technique is robust against several attacks; however, the resulting signal-to-noise
ratio (SNR) is only slightly above 20 dB. The authors of [11, 12] proposed audio watermarking schemes in
the transform domain based on SVD. The system applies chaotic sequences to binary watermarks to
increase confidentiality. Al-Nuaimy et al. [13] applied the watermarking method to a Bluetooth-based
automatic speaker identification system; however, improvements in robustness are needed. An audio
watermark has good resistance and good SNR values when the DWT and DCT methods are combined, and
LWT-DCT-SVD does not outperform the DWT-DCT method in that comparison; the shortcoming of that paper is
that adding synchronization bits and CS processing to the watermark still produces errors at
extraction [14]. In a subsequent paper, the proposed scheme is robust against hybrid attacks and
desynchronization, but it still has many imperfections [15]. The scheme proposed in [16] is robust to
volumetric scaling attacks, which are a crucial weakness of conventional QIM-based watermarking
algorithms. The scheme is also sufficiently resistant to common signal processing operations, as
attested by the test results in Section IV of [16]. In addition, the extractor of this method does not
require the original audio signal for watermark extraction. The paper in [17] also tries to improve
the resistance of the method by improving the adaptive arrangement of the efficiency coefficients and
by improving the synchronization of the extractor during watermark extraction.

In this paper, the LWT-DCT-SVD method is selected in order to obtain a better watermarking result,
improving audio quality, robustness to attacks, and capacity. The watermarking process is divided into
two stages: embedding and extraction. The embedding stage inserts the watermark bits into the host
audio. First, the host audio signal is read and combined with synchronization bits. Then the LWT is
applied to select a frequency subband. After that, the DCT is applied to the selected subband,
transforming it from the time domain to the frequency domain. The output of the DCT is passed to
the SVD, which produces the S, U, and V matrices. The U and V matrices are forwarded to SVD
reconstruction, while the S matrix is modified by the watermark in the embedding process. In the QIM
step, the watermark is inserted into the S matrix. Beforehand, the two-dimensional watermark matrix is
converted to one dimension in pre-processing, and compressive sampling acquisition then reduces
the original matrix to a smaller one. The extraction stage recovers the watermark bits from
the watermarked audio. In this stage, the watermarked audio is read and the synchronization bits are
detected; the result then passes through the same processes as in embedding until the S matrix is
obtained from the SVD. Next, the S matrix is processed by QIM extraction. After that, the CS
reconstruction process recovers the watermark from the extracted compressed watermark.

This paper is organized as follows: section 2 describes the basic theory of audio watermarking and
the embedding methods, section 3 describes the audio watermarking model with its embedding and
extraction processes, section 4 presents the results and the analysis of several performance
parameters, and section 5 gives the conclusion.


2. RESEARCH METHOD 
2.1. Lifting wavelet transform (LWT) 

The wavelet transform is a linear transformation similar to the Fourier transform, with one
important difference: the wavelet transform localizes in time the components of different frequencies
in the given signal. It is a time-frequency decomposition method. The lifting wavelet transform is one
type of wavelet transform [2]. The lifting scheme is a simple method for constructing biorthogonal
wavelets. The lifting wavelet transform is divided into three steps:


2.1.1. Split 


The split step divides the data into two smaller subsets: $e_{j-1}$ for the even-indexed samples
and $o_{j-1}$ for the odd-indexed samples. The signal is:


$$s_j = \{s_{j,k}\} \tag{1}$$


The length of each subset is half that of the original set. The split process in the LWT is calculated
as follows:


$$\mathrm{split}(s_j) = (e_{j-1}, o_{j-1}) \tag{2}$$
$$e_{j-1} = \{e_{j-1,k} = s_{j,2k}\} \tag{3}$$
$$o_{j-1} = \{o_{j-1,k} = s_{j,2k+1}\} \tag{4}$$


The inverse LWT (ILWT) returns the signal to its original form. Its first step undoes the update and
recovers the even samples:


$$e_{j-1} = s_{j-1} - U(d_{j-1}) \tag{5}$$


2.1.2. Prediction 

The prediction step predicts one subset from the other based on the local correlation in the original
data, using the even and odd sequences. The detail is then defined as the difference between the data
and its prediction. If the predictions are reasonable, the differences $d_{j-1}$ will be small and will
contain much less information than the original subset $o_{j-1}$. The prediction process is calculated
as follows:


$$d_{j-1} = o_{j-1} - P(e_{j-1}) \tag{6}$$


where $P$ is the prediction operator, which is computed from the corresponding even samples $e_{j-1}$.
The corresponding ILWT step restores the odd samples:


$$o_{j-1} = d_{j-1} + P(e_{j-1}) \tag{7}$$


2.1.3. Update 

The update step is applied after the prediction step. Its purpose is to maintain some global
properties of the original data. The subset produced so far may not share features, such as the average
value, with the original data, so an update is needed to preserve those characteristics. The update
process is calculated as follows:


$$s_{j-1} = e_{j-1} + U(d_{j-1}) \tag{8}$$


where $s_{j-1}$ is the low-frequency part of $s_j$ and $U$ is the update operator. Different choices of
$P$ and $U$ yield different wavelet transforms. The final ILWT step merges the even and odd subsets:


$$s_j = \mathrm{Merge}(e_{j-1}, o_{j-1}) \tag{9}$$
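
The split, predict, and update steps can be illustrated with a minimal sketch. The paper does not state which prediction and update operators it uses, so the Haar-style choices below, P(e) = e and U(d) = d/2, are assumptions made only for illustration, as are the function names `lwt_haar` and `ilwt_haar`.

```python
def lwt_haar(signal):
    """One level of a lifting transform: split, predict, update.

    Assumes Haar-style operators P(e) = e and U(d) = d / 2 (illustrative choice).
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    even = signal[0::2]                                   # e[k] = s[2k]
    odd = signal[1::2]                                    # o[k] = s[2k + 1]
    detail = [o - e for o, e in zip(odd, even)]           # d = o - P(e)
    approx = [e + d / 2 for e, d in zip(even, detail)]    # s = e + U(d)
    return approx, detail


def ilwt_haar(approx, detail):
    """Inverse transform: undo update, undo predict, merge."""
    even = [s - d / 2 for s, d in zip(approx, detail)]    # e = s - U(d)
    odd = [d + e for d, e in zip(detail, even)]           # o = d + P(e)
    merged = [0.0] * (2 * len(even))
    merged[0::2] = even                                   # interleave even samples
    merged[1::2] = odd                                    # interleave odd samples
    return merged


host = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, detail = lwt_haar(host)
restored = ilwt_haar(approx, detail)
```

With these operators the approximation subband holds the pairwise averages of the host samples, and the inverse returns the original samples exactly, which is the property the embedding stage relies on when it applies the ILWT.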
2.2. Discrete cosine transform (DCT) 

The DCT is a transform widely used in signal processing. It transforms the signal from the time
domain to the frequency domain and represents segments of an audio signal as a sum of cosine functions
at different frequencies [18]. The DCT is defined as follows:


$$X(k) = w(k) \sum_{n=0}^{N-1} x(n) \cos\left(\frac{\pi (2n+1) k}{2N}\right), \quad k = 0, 1, \ldots, N-1 \tag{10}$$

$$w(k) = \begin{cases} \dfrac{1}{\sqrt{N}}, & k = 0 \\ \sqrt{\dfrac{2}{N}}, & k = 1, 2, \ldots, N-1 \end{cases} \tag{11}$$


where $N$ is the number of samples and $x(n)$ is the audio signal. The DCT can reduce the distortion
that watermarking introduces into the signal because it compacts the signal energy into a few
coefficients. The IDCT is defined by (12):


$$x(n) = \sum_{k=0}^{N-1} w(k) X(k) \cos\left(\frac{\pi (2n+1) k}{2N}\right), \quad n = 0, 1, \ldots, N-1 \tag{12}$$

where $X(k)$ is the DCT result and $w(k)$ is the weight defined in (11).
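
As a small illustration of the transform pair, the sketch below evaluates the DCT and IDCT sums directly. It assumes the orthonormal weights 1/sqrt(N) for k = 0 and sqrt(2/N) otherwise; the function names are hypothetical, and the direct O(N^2) evaluation is chosen for readability rather than speed.

```python
import math


def _weight(k, n_samples):
    # Orthonormal DCT weights (assumed): 1/sqrt(N) for k = 0, sqrt(2/N) otherwise.
    return math.sqrt(1.0 / n_samples) if k == 0 else math.sqrt(2.0 / n_samples)


def dct(x):
    """Forward DCT of a list of samples, evaluated directly from the sum."""
    n_samples = len(x)
    out = []
    for k in range(n_samples):
        acc = sum(
            x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_samples))
            for n in range(n_samples)
        )
        out.append(_weight(k, n_samples) * acc)
    return out


def idct(coeffs):
    """Inverse DCT, using the same weights as the forward transform."""
    n_samples = len(coeffs)
    out = []
    for n in range(n_samples):
        acc = sum(
            _weight(k, n_samples) * coeffs[k]
            * math.cos(math.pi * (2 * n + 1) * k / (2 * n_samples))
            for k in range(n_samples)
        )
        out.append(acc)
    return out


frame = [1.0, 2.0, 3.0, 4.0]
coeffs = dct(frame)
recovered = idct(coeffs)
```

For this frame the first coefficient equals the sum of the samples divided by sqrt(N), and the total energy of the coefficients matches that of the samples, which reflects the orthonormal weighting.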


2.3. Singular value decomposition (SVD) 

Singular value decomposition is a method for decomposing a matrix. A matrix A is decomposed into
three components: two orthogonal matrices and a diagonal matrix containing the singular values. SVD
has several main properties. First, the singular values of audio are stable, so when the audio is
subjected to slight attacks, the singular values do not change significantly. Second, the audio quality
is not noticeably affected by a small change in the singular values. Third, the singular values
represent intrinsic properties of the audio. SVD also has general properties related to transpose, flip,
rotation, and scaling [19]. Given any m x n matrix A, the matrices U, S, and V are calculated using


$$A = U S V^{T} \tag{13}$$


where $U$ is an orthogonal matrix, $S$ is a diagonal matrix, and $V$ is an orthonormal matrix [20].
The inverse singular value decomposition (ISVD) can be defined as follows:


$$A_w = U S_w V^{T} \tag{14}$$

where $S_w$ is the singular value matrix after modification and $A_w$ is the reconstructed matrix.
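
A small worked example may help show why the singular values are a convenient place to embed data. The numbers below are chosen here purely for illustration and do not come from the paper; $S_w$ and $A_w$ denote an assumed modified singular value matrix and the matrix rebuilt from it.

```latex
A = \begin{pmatrix} 3 & 0 \\ 0 & -2 \end{pmatrix}
  = \underbrace{\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}}_{U}
    \underbrace{\begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}}_{S}
    \underbrace{\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}^{T}}_{V^{T}}

% Changing only the largest singular value from 3 to 3.1:
S_w = \begin{pmatrix} 3.1 & 0 \\ 0 & 2 \end{pmatrix}
\quad\Longrightarrow\quad
A_w = U S_w V^{T} = \begin{pmatrix} 3.1 & 0 \\ 0 & -2 \end{pmatrix}
```

The perturbation of 0.1 in one singular value changes the reconstructed matrix by the same small amount, while U and V are reused unchanged.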


2.4. Compressive sampling (CS) 

Compressive sampling is a compression method that takes a small number of random samples,
followed by a projection transformation. Basically, this technique is a way to simplify the signal.
For example, a signal of millions of bytes can be reduced to only a few hundred kilobytes. CS can
reconstruct a signal from a number of random measurements, taken with what is called the sampling
(or sensing) matrix, provided that the signal is sparse [21]. A signal $x \in \mathbb{R}^{N}$ is
k-sparse when only k elements of x are non-zero. If y is the measurement vector and A is a sampling
matrix of size M x N, the acquisition process is calculated using the following formula:


$$y = A x \tag{15}$$


Basically, CS theory has two components: recoverability and stability. Recoverability concerns
the types of measurement matrices and recovery procedures that ensure exact recovery of all k-sparse
signals (signals with k non-zeros), and how many measurements are needed to ensure that recovery.
Stability, on the other hand, addresses the robustness of recovery when the measurements are noisy
and/or the sparsity is inexact [22]. Stability results describe how accurately a CS approach can
recover signals in such situations. The stability results established for the $\ell_1$ minimization
model can be defined as follows [22]:


$$\hat{x} = \arg\min \|x\|_1 \quad \text{subject to} \quad y = A x \tag{16}$$

where $\|\cdot\|$ denotes the norm (length) of a vector in a normed vector space, here the $\ell_1$
norm of x. The vector x is the signal generated after the extraction process, and the output $\hat{x}$
is the sparse signal with the minimum norm that is consistent with the measurements [22].
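
The acquisition step y = A x can be sketched in a few lines. This is only an illustration of the measurement side: the random +/-1 sensing matrix, the seed, the sizes M = 8 and N = 16, and the function names are assumptions, and the l1 recovery step is not shown because it requires a convex solver.

```python
import random


def sensing_matrix(m, n, seed=0):
    """Random +/-1 sensing matrix scaled by 1/sqrt(m) (an assumed, common choice)."""
    rng = random.Random(seed)
    scale = 1.0 / m ** 0.5
    return [[rng.choice((-scale, scale)) for _ in range(n)] for _ in range(m)]


def acquire(matrix, x):
    """Compute the measurement vector y = A x with plain nested sums."""
    return [sum(a * v for a, v in zip(row, x)) for row in matrix]


# A 2-sparse binary watermark vector of length N = 16 (illustrative data).
watermark = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
A = sensing_matrix(8, 16)
y = acquire(A, watermark)   # M = 8 measurements instead of N = 16 samples
```

Because the watermark has only two non-zero entries, each measurement is simply the sum of two matrix columns, which is the kind of structure that sparse recovery exploits.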
2.5. Quantization index modulation (QIM)

Quantization index modulation (QIM) is a computationally efficient method for embedding
additional information as a watermark. It can be applied in the time domain, in the frequency domain,
or after a transformation process. QIM watermark embedding has two stages. First, the watermark
modulates an index that selects a quantizer. Second, the host signal at a specific frequency is
quantized with the quantizer that matches the watermark bit to be embedded into the host signal [23].
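
A minimal sketch of the two QIM stages on scalar values follows. The paper does not give its exact quantizer, so the two interleaved uniform quantizers below (offset by half a step for bit 1), the step size of 0.5, the sample values, and the function names are all assumptions for illustration.

```python
import math


def qim_embed(value, bit, step):
    """Quantize value with the quantizer selected by bit (0 or 1)."""
    offset = bit * step / 2.0                      # bit 1 shifts the lattice by step/2
    index = math.floor((value - offset) / step + 0.5)
    return index * step + offset


def qim_extract(value, step):
    """Return the bit whose quantizer lies closest to value."""
    d0 = abs(value - qim_embed(value, 0, step))
    d1 = abs(value - qim_embed(value, 1, step))
    return 0 if d0 <= d1 else 1


step = 0.5
singular_values = [3.37, 1.92, 0.81, 0.44]         # illustrative host values
bits = [1, 0, 1, 1]
marked = [qim_embed(s, b, step) for s, b in zip(singular_values, bits)]

# A disturbance smaller than step/4 leaves the extracted bits unchanged.
attacked = [m + e for m, e in zip(marked, [0.1, -0.1, 0.1, -0.1])]
extracted = [qim_extract(v, step) for v in attacked]
```

A larger step tolerates stronger attacks but moves each host value further, which is one concrete form of the robustness versus imperceptibility trade-off.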


2.6. Spread spectrum 

Spread spectrum (SS) is a communication method in which the signal is distributed across
the available frequency spectrum, spreading the information signal over a wider bandwidth to prevent
interruption of the information. The term spread spectrum is used because in this system
the transmitted signal has a wider bandwidth than the information signal. In SS watermarking,
the embedded watermark information is added linearly, as a modulated SS sequence, to the host audio
signal [24].

The combination of LWT, DCT, and SVD as pre-processing for the host audio is an effective
transform chain that yields a signal with strong energy. This provides a very good medium for hiding
information with the QIM method. At the same time, the watermark is compressed by CS before being
hidden by QIM, so the watermark payload increases. The SS method is used in this paper to make
the audio watermarking method robust to delay attacks.
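
How a spread-spectrum synchronization code can survive a delay may be sketched as follows. The pseudo-noise length of 31 chips, the gain alpha = 0.2, the low-amplitude synthetic host, and all function names are assumptions for illustration; the paper does not specify its synchronization code.

```python
import math
import random


def pn_sequence(length, seed=1):
    """Pseudo-noise sequence of +/-1 chips from a seeded generator."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def add_sync(host, pn, offset, alpha):
    """Linearly add the scaled PN sequence to the host, starting at offset."""
    marked = list(host)
    for i, chip in enumerate(pn):
        marked[offset + i] += alpha * chip
    return marked


def detect_sync(signal, pn):
    """Slide the PN sequence over the signal and return the best-matching offset."""
    best_offset, best_score = 0, float("-inf")
    for start in range(len(signal) - len(pn) + 1):
        score = sum(signal[start + i] * chip for i, chip in enumerate(pn))
        if score > best_score:
            best_offset, best_score = start, score
    return best_offset


host = [0.002 * math.sin(0.3 * i) for i in range(200)]   # quiet synthetic host
pn = pn_sequence(31)
marked = add_sync(host, pn, offset=57, alpha=0.2)
found = detect_sync(marked, pn)

delayed = [0.0] * 10 + marked                             # simulate a delay attack
found_after_delay = detect_sync(delayed, pn)
```

After the delay the correlation peak moves by exactly the number of inserted samples, so the extractor can realign its frames before running the LWT. The host here is deliberately quiet relative to alpha so that the peak is unambiguous; a real host would need a longer code or a different gain.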


3. PROPOSED ALGORITHM 

The proposed audio watermarking technique is divided into three processes: watermark processing,
embedding, and extraction. The watermark is the data inserted into the host audio. Embedding is
the process of inserting the watermark into the host audio signal. Extraction is the process of
retrieving the embedded information. The three stages are explained in this section.


3.1. Processing stage for watermark 
The watermark processing is presented as follows:
Step 1 : Read the watermark, which is an image matrix $W_{m \times n}$.
Step 2 : Convert the two-dimensional matrix into one dimension in pre-processing, producing w(n).
Step 3 : Apply compressive sampling acquisition to reduce the one-dimensional matrix to a smaller
matrix. Using (15), this produces the compressed watermark.


3.2. Embedding stage 
The proposed watermark embedding process is shown in Figure 1 and is presented as follows:
Step 1 : Read the host audio signal x(n) into a one-dimensional matrix.
Step 2 : Add synchronization bits to the host audio using the spread spectrum method.
Step 3 : Apply the LWT to select a frequency subband. Calculate using formulas (2), (3), (4),
(6), and (8).
Step 4 : Apply the DCT to transform the selected subband from the time domain to the frequency
domain. Calculate using formulas (10) and (11).
Step 5 : Apply the SVD to decompose the DCT output into the S, U, and V matrices. The U and V
matrices are forwarded to SVD reconstruction, while the S matrix goes through the QIM process.
Calculate using formula (13).
Step 6 : Embed the pre-processed watermark into the S matrix, producing the watermarked S matrix.
Step 7 : Apply the ISVD to combine the U and V matrices with the watermarked S matrix.
Step 8 : Apply the IDCT to convert the resulting frequency-domain signal back to the time domain.
Step 9 : Apply the ILWT to restore the signal to its original form. The result is the watermarked
audio signal.
Step 10 : Combine the synchronization signal with the host audio.
Step 11 : Calculate the signal-to-noise ratio (SNR) and the objective difference grade (ODG) of
the watermarked audio.

Figure 1. Embedding process


3.3. Extraction stage 
The proposed watermark extraction process is shown in Figure 2 and is presented as follows:
Step 1 : Read the watermarked audio into a one-dimensional matrix.
Step 2 : Detect the synchronization bits to prevent errors caused by misalignment with the embedded
data.
Step 3 : Apply the LWT to select a frequency subband. Calculate using formulas (5), (7), and (9).
Step 4 : Apply the DCT, as in the embedding process. Calculate using formula (12).
Step 5 : Apply the SVD to decompose the DCT output into the S, U, and V matrices; the S matrix
carries the watermark. Calculate using formula (14).
Step 6 : Take the S matrix and extract the watermark bits.
Step 7 : Apply the CS reconstruction process.
Step 8 : Convert the bits from one dimension to two dimensions, reversing the pre-processing, to
obtain the watermark.
Step 9 : Calculate the bit error rate (BER).














Figure 2. Extraction process 


4. RESULT AND ANALYSIS

This section explains the CS performance on the watermark, the performance of the audio
watermarking scheme before and after attacks, and the comparison with other research. The results are
analyzed in terms of BER, SNR, and ODG using MATLAB simulation. The audio files used in this
watermarking system are *.wav files with a sampling frequency of 44100 Hz and stereo channels.
The audio consists of piano, guitar, bass, drums, and conversational speech accompanied by a little
music.


4.1. CS performance on watermark 

The system is tested using different watermark sizes and different compression ratios.
The best compression ratio is then chosen for use in the next test scenario. This study uses a binary
image watermark with a size of 10 x 10 pixels. Figure 3 shows the watermark image used in the system.
Table 1 displays the results of CS testing.





Figure 3. Watermark image 


Based on Table 1, a larger compression ratio generally shortens the processing time. Based on
Figure 4, a larger pixel size gives better image quality. If the pixel size is large and the image is
compressed with a small bit compression ratio, the embedded watermark becomes larger. In this study,
a 32-pixel side and a bit compression ratio of 0.03 are used, so that the embedded watermark is smaller
and the processing time is shorter. Overall, the larger the pixel size and bit compression ratio,
the better the quality of the extracted watermark. Figure 4 shows the watermark results for different
compression ratios.


Figure 4. Watermark image with ratio compression 0.025 to 0.03 





Table 1. CS performance with 32 pixels


Bit ratio compression    Compression ratio    Elapsed time (s)
0.025                    75%                  19.349703
0.026                    87.5%                8.330482
0.027                    87.5%                8.815905
0.028                    87.5%                12.078903
0.029                    87.5%                8.158173
0.03                     100%                 6.774774


4.2. System performance before attack 

The system has several parameters: frame length (Nframe), quantization bit depth (Nbit),
subband threshold (Thr), and wavelet decomposition level (N). All parameters are optimized to reach
optimal performance. Optimal performance in watermarking means finding the parameter values that yield
balanced performance among the imperceptibility parameters (SNR/ODG), the robustness parameter (BER),
and the watermark payload or capacity (C).

Figure 5 displays the trade-off between SNR, C, and BER as the number of samples per frame
(Nframe) varies over the values [128, 256, 512, 1024]. SNR is displayed on a y-axis range of 62-80 dB,
BER on a y-axis range of 0-0.3, and payload on a y-axis range of 0-350 bps. The figure shows
the watermarking trade-off: when the SNR improves (becomes higher), the BER worsens (becomes higher)
and the payload also worsens. Table 2 shows the range of the optimized parameters and the optimal
parameters that result from the optimization process.








[Plot: SNR (dB), BER, and payload versus sample number per frame]


Figure 5. Imperceptibility, robustness and payload parameter trade-off 


Table 2. Performance result of optimal parameters


                      N      Nframe      Nbit    Thr                  ODG        SNR        BER    C
Range                 1-4    128-1024    1-10    0.9, 0.1, 0.00001
Optimal parameters    1      256         2       0.9                  -0.0498    53.8171    0      172.26


4.3. Performance system after attacks 

Five songs of different genres are tested with five attacks. The optimization process obtains
BER, SNR, ODG, and C while all parameters are varied. The optimal parameters are those that give
the highest SNR, the lowest BER, and the highest payload. This optimization is a mandatory step in
audio watermarking because imperceptibility, robustness, and watermark capacity are not directly
proportional to each other; this is usually called a trade-off. If the imperceptibility improves, then
the robustness and capacity worsen, and vice versa. Table 3 gives the average robustness (BER) for
the best parameters under various attacks. Table 4 shows the system performance after attacks using
the optimal parameters displayed in Table 2, and also compares the performance with previous research.
Balanced performance means meeting the standard performance for audio watermarking: the SNR must be
more than 20 dB, the ODG should be more than -1, the BER should be less than 20%, and the watermark
payload should be as high as possible.
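
The SNR and BER figures and the balance thresholds quoted above can be computed with a short sketch. The paper does not print its formulas, so the standard definitions below (SNR as the ratio of host power to embedding-distortion power, BER as the fraction of mismatched bits) and the function names are assumptions; ODG is not computed here because it requires a perceptual model.

```python
import math


def snr_db(host, watermarked):
    """SNR in dB, treating the difference between the two signals as noise."""
    signal_power = sum(x * x for x in host)
    noise_power = sum((x - y) ** 2 for x, y in zip(host, watermarked))
    if noise_power == 0:
        return float("inf")
    return 10.0 * math.log10(signal_power / noise_power)


def bit_error_rate(original_bits, extracted_bits):
    """Fraction of watermark bits that differ after extraction."""
    errors = sum(a != b for a, b in zip(original_bits, extracted_bits))
    return errors / len(original_bits)


def is_balanced(snr, odg, ber):
    """Thresholds from the text: SNR > 20 dB, ODG > -1, BER < 20%."""
    return snr > 20.0 and odg > -1.0 and ber < 0.20


# Toy signals and bits, for illustration only.
snr = snr_db([1.0, -1.0, 1.0, -1.0], [1.1, -1.0, 1.0, -1.0])
ber = bit_error_rate([1, 0, 1, 1], [1, 0, 0, 1])

# Optimal-parameter values reported in Table 2.
balanced = is_balanced(53.8171, -0.0498, 0.0)
```

Applied to the Table 2 values, the check passes all three thresholds, while the toy extraction above, with one wrong bit out of four, would fail the BER criterion.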
Table 3. Performance system after attacks


Attack             Parameter (Fs / speed)
LPF                3000, 6000, 9000
BPF                100-6k, 50-6k, 25-6k
Resampling         11025, 16000, 22050
MP3 compression    32k, 64k, 128k, 192k
Time scale         1%, 2%, 4%


Method      LPF 6k
[16]        NA
[25]        6
[26]        0-6
Proposed    0

5. CONCLUSION 

An SVD-based robust audio watermarking scheme with a compressive sampling framework on
the watermark has been proposed. In this scheme, compressive sampling is used to compress
the watermark so that more information can be inserted into the host audio. The system is able to
achieve high robustness, with a BER of 0 and a capacity of 127.26. The best parameters for this work
are a frame length of 256, 2 quantization bits, a bit depth of 16, an alpha of 0.003, a threshold of
0.9, and a wavelet level of 1. The audio watermarking algorithm is highly robust against LPF,
resampling, and linear speed change attacks.